
    Discovery and recognition of motion primitives in human activities

    We present a novel framework for the automatic discovery and recognition of motion primitives in videos of human activities. Given the 3D pose of a human in a video, human motion primitives are discovered by optimizing the 'motion flux', a quantity which captures the motion variation of a group of skeletal joints. A normalization of the primitives is proposed in order to make them invariant with respect to a subject's anatomical variations and to the data sampling rate. The discovered primitives are unknown and unlabeled, and are collected without supervision into classes via a hierarchical non-parametric Bayes mixture model. Once classes are determined and labeled, they are further analyzed to establish models for recognizing the discovered primitives. Each primitive model is defined by a set of learned parameters. Given new video data and the estimated pose of the subject appearing in the video, the motion is segmented into primitives, which are recognized with a probability determined by the parameters of the learned models. Using our framework we build a publicly available dataset of human motion primitives, using sequences taken from well-known motion capture datasets. We expect that our framework, by providing an objective way of discovering and categorizing human motion, will be a useful tool in numerous research fields, including video analysis, human-inspired motion generation, learning by demonstration, intuitive human-robot interaction, and human behavior analysis.
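
    The abstract does not spell out the functional form of the motion flux, so the following is only a minimal sketch: the summed speed of a joint group stands in for the flux, and a simple threshold crossing stands in for the segmentation rule. The names joints, group, and threshold are illustrative, not from the paper.

```python
import numpy as np

def motion_flux(joints, group, dt=1.0):
    """Aggregate speed of a joint group, used here as a stand-in for the
    paper's motion flux (whose exact form the abstract does not give).

    joints: (T, J, 3) array of 3D joint positions over T frames.
    group:  list of joint indices forming the group.
    """
    vel = np.diff(joints[:, group, :], axis=0) / dt   # (T-1, len(group), 3)
    return np.linalg.norm(vel, axis=2).sum(axis=1)    # (T-1,) flux per step

def segment_primitives(flux, threshold):
    """Cut candidate primitives where the flux rises above the threshold."""
    active = flux > threshold
    # pad with False so every active run has both a rising and a falling edge
    padded = np.concatenate(([False], active, [False]))
    edges = np.diff(padded.astype(int))
    starts = np.flatnonzero(edges == 1)   # first frame of each run
    ends = np.flatnonzero(edges == -1)    # one past the last frame
    return list(zip(starts, ends))
```

    The segments produced this way would then be the unlabeled inputs to the hierarchical non-parametric Bayes mixture mentioned in the abstract.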

    Vision-based deep execution monitoring

    Execution monitoring of high-level robot actions can be effectively improved by visually monitoring the state of the world in terms of the preconditions and postconditions that hold before and after the execution of an action. Furthermore, a policy for choosing where to look, either to verify the relations that specify the pre- and postconditions or to refocus in case of a failure, can tremendously improve robot execution in an uncharted environment. Thanks to the remarkable results of deep learning, it is now possible to rely strongly on visual perception in order to assume that the environment is observable. In this work we present visual execution monitoring for a robot executing tasks in an uncharted lab environment. The execution monitor interacts with the environment via a visual stream that uses two DCNNs for recognizing the objects the robot has to deal with and manipulate, and a non-parametric Bayes estimation to discover the relations from the DCNN features. To recover from lack of focus and from failures due to missed objects, we resort to visual search policies learned via deep reinforcement learning.
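
    A minimal sketch of the monitoring loop described above, assuming a perceive callback that stands in for the two DCNNs plus the Bayes relation estimator and returns (subject, relation, object) triples; the Action container and the return codes are hypothetical, not the paper's interface.

```python
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    preconditions: frozenset = frozenset()   # e.g. {("cup", "on", "table")}
    postconditions: frozenset = frozenset()

def monitor(action, perceive, execute):
    """perceive() returns the set of (subject, relation, object) triples
    currently believed to hold; execute(action) runs the action on the robot.
    Both are placeholders for the paper's DCNNs + Bayes relation estimator."""
    if not action.preconditions <= perceive():
        return "precondition-failure"   # would trigger the visual search policy
    execute(action)
    if not action.postconditions <= perceive():
        return "postcondition-failure"  # refocus / replan from here
    return "success"
```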

    Rigid tool affordance matching points of regard

    In this abstract we briefly introduce the analysis of simple rigid object affordance, experimentally establishing the relation between the points of regard of subjects before grasping an object and the fingertip points of contact once the object is grasped. The analysis shows that there is a strong relation between these data, thus supporting the hypothesis that people figure out how objects are afforded according to their functionality.
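
    As a hedged illustration of how such a relation could be quantified, the sketch below computes, for each pre-grasp fixation, the distance to the nearest fingertip contact; both point sets are assumed to be expressed in the object's frame, which the abstract does not state.

```python
import numpy as np

def gaze_to_contact_distances(gaze_points, contact_points):
    """Distance from each pre-grasp point of regard to the nearest fingertip
    contact; consistently small values would indicate the strong relation
    reported in the abstract.

    gaze_points:    (N, 3) fixation points before grasping (object frame, assumed).
    contact_points: (M, 3) fingertip contact points once the object is grasped.
    """
    d = np.linalg.norm(
        gaze_points[:, None, :] - contact_points[None, :, :], axis=2
    )                       # (N, M) pairwise distances
    return d.min(axis=1)    # (N,) nearest-contact distance per fixation
```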

    Bayesian non-parametric inference for manifold based MoCap representation

    We propose a novel approach to human action recognition, with motion capture (MoCap) data, based on grouping sub-body parts. By representing configurations of actions as manifolds, joint positions are mapped onto a subspace via principal geodesic analysis. The reduced space is still highly informative and allows for classification based on a non-parametric Bayesian approach, generating behaviors for each sub-body part. Having partitioned the set of joints, poses relative to a sub-body part are exchangeable given a specified prior, and can elicit, in principle, infinitely many behaviors. The generation of these behaviors is specified by a Dirichlet process mixture. We show with several experiments that recognition gives very promising results, outperforming methods requiring temporal alignment.
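
    A rough sketch of this pipeline under two stated simplifications: plain PCA is used as a Euclidean surrogate for principal geodesic analysis, and scikit-learn's truncated Dirichlet-process mixture stands in for the paper's model. The poses input and the component counts are illustrative.

```python
from sklearn.decomposition import PCA
from sklearn.mixture import BayesianGaussianMixture

def subbody_behaviors(poses, n_components=5, max_behaviors=20):
    """Cluster the poses of one sub-body part into behaviors.

    poses: (T, d) flattened joint positions of the sub-body part over time.
    PCA is a Euclidean surrogate for principal geodesic analysis; a faithful
    version would compute the Frechet mean on the pose manifold and apply
    PCA in its tangent space.
    """
    reduced = PCA(n_components=n_components).fit_transform(poses)
    # Truncated Dirichlet-process mixture: the number of occupied components
    # (behaviors) is inferred from the data, up to max_behaviors.
    dpmm = BayesianGaussianMixture(
        n_components=max_behaviors,
        weight_concentration_prior_type="dirichlet_process",
        max_iter=500,
    )
    return dpmm.fit_predict(reduced)
```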

    Component-wise modeling of articulated objects

    We introduce a novel framework for modeling articulated objects based on the aspects of their components. By decomposing the object into components, we divide the problem into smaller modeling tasks. After obtaining 3D models for each component aspect by employing a shape deformation paradigm, we merge them together, forming the object components. The final model is obtained by assembling the components using an optimization scheme which fits the respective 3D models to the corresponding apparent contours in a reference pose. The results suggest that our approach can produce realistic 3D models of articulated objects in reasonable time.
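
    The assembly optimization could look roughly like the sketch below, which fits one component's rigid pose by minimizing the mean distance between its projected points and the apparent contour. The project camera model and the use of Nelder-Mead are assumptions, not the paper's scheme.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation

def fit_component(model_pts, contour_pts, project):
    """Fit one component's rigid pose so its projection matches the contour.

    model_pts:   (N, 3) points sampled on the component's 3D model.
    contour_pts: (M, 2) apparent-contour points in the reference pose.
    project:     assumed camera model mapping (N, 3) points to (N, 2) pixels.
    """
    def cost(theta):                                   # theta: rotvec + translation
        R = Rotation.from_rotvec(theta[:3]).as_matrix()
        pts2d = project(model_pts @ R.T + theta[3:])
        d = np.linalg.norm(pts2d[:, None, :] - contour_pts[None, :, :], axis=2)
        return d.min(axis=1).mean()                    # mean nearest-contour distance
    return minimize(cost, np.zeros(6), method="Nelder-Mead")
```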

    Visual search and recognition for robot task execution and monitoring

    Visual search for relevant targets in the environment is a crucial robot skill. We propose a preliminary framework for the execution monitor of a robot task, taking care of the robot's disposition to visually search the environment for the targets involved in the task. Visual search is also relevant for recovering from a failure. The framework exploits deep reinforcement learning to acquire a "common sense" scene structure, and it takes advantage of a deep convolutional network to detect objects and the relevant relations holding between them. The framework builds on these methods to introduce vision-based execution monitoring, which uses classical planning as a backbone for task execution. Experiments show that with the proposed vision-based execution monitor the robot can complete simple tasks and can autonomously recover from failures.
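
    For illustration only, a tabular Q-learning policy stands in below for the deep reinforcement learning component; the states, gaze actions, and rewards are placeholders for whatever the actual framework uses.

```python
import random
from collections import defaultdict

class GazePolicy:
    """Tabular Q-learning stand-in for the deep-RL search policy: states are
    coarse scene descriptors, actions are gaze directions, and the reward is
    positive when the searched target enters the view."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, eps=0.1):
        self.q = defaultdict(float)                  # Q[(state, action)]
        self.actions = actions
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, state):
        if random.random() < self.eps:               # epsilon-greedy exploration
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td
```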

    Towards an understanding of human activities: from the skeleton to the space

    This thesis describes the research undertaken for a Ph.D. project in computer vision, whose main objective is to tackle human activity recognition from RGB videos. Human activity recognition from videos aims to recognize which human activities take place during a video, considering only cues directly extracted from the video frames. The related applications are manifold: healthcare monitoring, such as rehabilitation or stress monitoring, surveillance of indoor and outdoor activities, human-machine interaction, entertainment, etc. An important disambiguation has to be made before proceeding further: the one between action and activity. Actions are generally described in the literature as single-person movements that may be composed of multiple simple gestures organized temporally, such as walking, waving, or punching. Gestures are instead elementary movements of a body part. Activities, on the other hand, are described as involving two or more persons and/or objects, or a single person performing complex actions, i.e. a sequence of actions. Human activity recognition has long been one of the main subjects of study of the computer vision and machine learning communities, and it is still a hot topic due to its complexity. Developing a system for human activity recognition is challenging due to well-known computer vision problems: body part occlusions, lighting conditions, and image resolution are only a subset of them. Furthermore, similarities between activity classes make the problem even harder. Activities in the same class may be exhibited by distinct persons with distinct body movements, and activities in different classes may be hard to discriminate because they may be constituted by analogous information. The way in which humans execute an activity also depends on their habits, which makes recognizing activities quite difficult. The main conclusion emerging from a deep analysis of the available literature on activity recognition is that a robust activity recognition system has to be context-aware: not only is human motion important for achieving good performance, but other relevant cues that can be extracted from videos have to be considered as well. The available state-of-the-art research in computer vision still lacks a complete framework for context-based human activity recognition that takes into account the scene where activities take place, object analysis, 3D human motion analysis, and the interdependence between activity classes. This thesis describes computer vision frameworks which enable the robust recognition of human activities by explicitly considering the scene context. It presents the main contributions to context-aware activity recognition, regarding 3D modeling of articulated and complex objects, 3D human pose estimation from single images, and a method for activity recognition based on human motion primitives. Four major publications are presented, together with an extensive literature review covering computer vision areas such as 3D object modeling, 3D human pose estimation, human action recognition, action recognition based on action and motion primitives, and context-based human activity recognition. Future work for the undertaken research will be to build a complete context-based activity recognition system, exploiting the several frameworks introduced so far.

    Single image object modeling based on BRDF and r-surfaces learning

    A methodology for 3D surface modeling from a single image is proposed. The principal novelty is the modeling of concave and specular surfaces without any externally imposed prior. The main idea of the method is to use BRDFs and generated rendered surfaces to transfer the normal field, computed for the generated samples, to the unknown surface. The transferred information is adequate to blow and sculpt the segmented image mask into a bas-relief of the object. The object surface is further refined based on a photo-consistency formulation that relates the original image and the modeled object for error minimization.
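
    The abstract does not give the refinement objective, so here is a minimal sketch of a photo-consistency error under a Lambertian shading stand-in for the learned BRDF; height, image, and light are illustrative inputs, and the actual method would minimize a term of this kind over the surface.

```python
import numpy as np

def photo_consistency_error(height, image, light, albedo=1.0):
    """Mean squared difference between the image and the shaded surface.

    height: (H, W) bas-relief height map of the object.
    image:  (H, W) grayscale intensities of the original image.
    light:  (3,) light direction; Lambertian shading stands in for the BRDF.
    """
    gy, gx = np.gradient(height)
    normals = np.dstack([-gx, -gy, np.ones_like(height)])
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    rendered = albedo * np.clip(normals @ np.asarray(light, float), 0.0, None)
    return float(((rendered - image) ** 2).mean())
```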

    Human motion primitive discovery and recognition

    We present a novel framework for the automatic discovery and recognition of human motion primitives from motion capture data. Human motion primitives are discovered by optimizing the 'motion flux', a quantity which depends on the motion of a group of skeletal joints. Models of each primitive category are computed via non-parametric Bayes methods, and recognition is performed based on their geometric properties. A normalization of the primitives is proposed in order to make them invariant with respect to anatomical variations and data sampling rate. Using our framework we build a publicly available dataset of human motion primitives based on motion capture sequences taken from well-known datasets. We expect that our framework, by providing an objective way of discovering and categorizing human motion, will be a useful tool in numerous research fields related to robotics, including human-inspired motion generation, learning by demonstration, and intuitive human-robot interaction.
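
    The normalization recipe is not detailed in the abstract; the sketch below shows two plausible ingredients, fixed-length resampling for sampling-rate invariance and bone-length scaling for anatomical invariance, with ref_pair as a hypothetical reference bone.

```python
import numpy as np

def normalize_primitive(joints, n_frames=50, ref_pair=(0, 1)):
    """Resample a primitive to a fixed length and scale by a reference bone.

    joints:   (T, J, 3) joint positions of one discovered primitive.
    ref_pair: indices of two joints whose distance is used as the anatomical
              scale (a hypothetical choice, not the paper's exact recipe).
    """
    T, J, _ = joints.shape
    t_old = np.linspace(0.0, 1.0, T)
    t_new = np.linspace(0.0, 1.0, n_frames)
    flat = joints.reshape(T, J * 3)
    resampled = np.stack(
        [np.interp(t_new, t_old, flat[:, k]) for k in range(J * 3)], axis=1
    ).reshape(n_frames, J, 3)                 # sampling-rate invariance
    bone = np.linalg.norm(joints[0, ref_pair[0]] - joints[0, ref_pair[1]])
    return resampled / bone                   # anatomical-scale invariance
```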